The goal of this work is to study the large sample properties of the posterior-based inference in the curved exponential family under increasing dimension. The curved structure arises from the imposition of various restrictions, such as moment restrictions, on the model, and plays a fundamental role in various branches of data analysis. We establish conditions under which the posterior distribution is approximately normal, which in turn implies various good properties of estimation and inference procedures based on the posterior. In the process we revisit and improve upon previous results for the exponential family under increasing dimension by making use of concentration of measure. We also discuss a variety of applications including the multinomial model with moment restrictions, seemingly unrelated regression equations, and single structural equation models. In our analysis, both the parameter dimension and the number of moments are increasing with the sample size.

1. Introduction. The main motivation for this paper is to obtain large sample results for posterior inference in the curved exponential family under increasing dimension. Recall that in the exponential family, the log of a density is linear in parameters $\theta \in \Theta$. In the curved exponential family, these parameters $\theta$ are restricted to lie on a curve $\eta \mapsto \theta(\eta)$ parameterized by a lower dimensional parameter $\eta \in \Psi$. There are many classical examples of densities that fall in the curved exponential family; see, for example, Efron [?], Lehmann and Casella [15], and Barndorff-Nielsen [?]. Curved exponential densities have also been extensively used in applications [?]. One example of a condition that puts a curved structure onto an exponential family is a moment restriction of the type:

$$\int m(x,\alpha)\, f(x;\theta)\, dx = 0,$$

This restriction forces $\theta$ to lie on a curve that can be parameterized as $\{\theta(\eta), \eta \in \Psi\}$. Here the component $\eta = (\alpha, \beta)$ contains $\alpha$ together with other parameters $\beta$, which suffice to parameterize all parameters $\theta \in \Theta$ that solve the above equation for some $\alpha$. In econometric applications, moment restrictions often represent Euler equations that arise because the data are the outcome of an optimization by rational decision-makers; see, e.g., Hansen and Singleton [9], Chamberlain [3], Imbens [11], and Donald, Imbens and Newey [5]. Thus, the curved exponential framework is a fundamental complement to the exponential framework, at least in certain fields of data analysis.

Despite its applicability, the curved exponential family under high dimensionality is theoretically less well understood than the exponential family. In this paper, we contribute to the theoretical analysis of posterior inference in curved exponential families under high dimensionality. We provide sufficient conditions for consistency and asymptotic normality of the posterior when both the dimension of the parameter space and the sample size are increasing. Our framework requires only weak conditions on the prior distribution, which allows for improper priors. In particular, the uninformative prior always satisfies our assumptions. We also study the convergence of moments and the precision with which we can estimate them. We then apply these results to a variety of models in which both the parameter dimension and the number of moments increase with the sample size.

The present analysis of the posterior inference in the curved exponential family builds upon the previous work of Ghosal [12], who studied posterior inference in the exponential family under increasing dimension. Under sufficient growth restrictions on the dimension of the model, Ghosal showed that the posterior distributions concentrate in neighborhoods of the true parameter and can be approximated by an appropriate normal distribution. Ghosal's analysis extended in a fundamental way the classical results of Portnoy [18] for maximum likelihood methods for the exponential family with increasing dimensions.

In addition to a detailed treatment of the curved exponential family, we also establish some useful results for exponential families. In fact, we begin our analysis by revisiting Ghosal's increasing dimension setup for the exponential family. Our results complement Ghosal's in three ways. First, we amend the conditions on priors to allow for a larger set of priors, for example, improper priors. Second, we use concentration inequalities for log-concave densities to sharpen the conditions under which the normal approximations apply. Third, we show that approximating the $\alpha$-th order moments of the posterior by the corresponding moments of the normal density becomes exponentially difficult in the order $\alpha$.

The rest of the paper is organized as follows. In Section 2 we formally define the framework and assumptions, and develop results for the exponential family. In Section 3, the main section, we develop the results for the curved exponential family. In Section 4 we derive bounds on the quantity $\lambda_n(c)$ that controls deviations from normality. In Section 5 we apply our results to a variety of applications. The appendices collect proofs of the main results and technical lemmas.

2. Exponential Family Revisited. Assume that we have a triangular array of random samples

$$\begin{array}{cccc} X_1^{(1)} & & & \\ X_1^{(2)} & X_2^{(2)} & & \\ \vdots & \vdots & \ddots & \\ X_1^{(n)} & X_2^{(n)} & \cdots & X_n^{(n)} \end{array}$$

Assume further that, for each $n$, the $\{X_i^{(n)}\}_{i=1}^{n}$ are independent $d^{(n)}$-dimensional vectors drawn from a $d^{(n)}$-dimensional exponential family whose density is defined by

$$f(x;\theta^{(n)}) = \exp\big(\langle x, \theta^{(n)}\rangle - \psi^{(n)}(\theta^{(n)})\big), \tag{2.1}$$

where $\theta^{(n)} \in \Theta^{(n)}$, an open convex set of $\mathbb{R}^{d^{(n)}}$, and $\psi^{(n)}$ is the associated normalizing function. Let $\theta_0^{(n)} \in \Theta^{(n)}$ denote the (sequence of) true parameters, which is assumed to be bounded away from the boundary of $\Theta^{(n)}$ (uniformly in $n$). Following Huber [?], for notational convenience we will suppress the superscript $(n)$, but it is understood that the associated objects are changing with $n$.

Under this framework, the posterior density of $\theta$ given the observed data $\{X_i\}_{i=1}^n$ is defined as

$$\pi_n(\theta) = \frac{\pi(\theta)\prod_{i=1}^n f(X_i;\theta)}{\int_\Theta \pi(\theta)\prod_{i=1}^n f(X_i;\theta)\,d\theta} = \frac{\pi(\theta)\exp\big(\langle \sum_{i=1}^n X_i, \theta\rangle - n\psi(\theta)\big)}{\int_\Theta \pi(\theta)\exp\big(\langle \sum_{i=1}^n X_i, \theta\rangle - n\psi(\theta)\big)\,d\theta}, \tag{2.2}$$

where $\pi(\cdot)$ $(= \pi^{(n)}(\cdot))$ denotes a prior distribution on $\Theta$. As expected, we will need to impose some regularity conditions on the prior $\pi$, and these differ from the ones imposed in [12]. The same Lipschitz condition is required. However, we require only a relative lower bound on the value of the prior at the true parameter instead of an absolute bound (see Theorem 1). Such conditions allow for improper priors, which were not allowed in [12]. In fact, the uninformative prior trivially satisfies our assumptions.

Our results are stated in terms of a re-centered Gaussian distribution in the local parameter space. Let $\mu = \psi'(\theta_0)$ and $F = \psi''(\theta_0)$ be the mean and covariance matrix associated with the random variables $\{X_i\}$, and let $J = F^{1/2}$ be its square root (i.e., $JJ^T = F$). The re-centering is defined as $\Delta_n := \sqrt{n}\,J^{-1}\big(\frac{1}{n}\sum_{i=1}^n X_i - \mu\big)$. It follows that $E[\Delta_n] = 0$ and that $E[\Delta_n\Delta_n^T]$ is the identity matrix of appropriate (increasing) dimension $d$. Moreover, the posterior in the local parameter space is defined for $u \in U = \sqrt{n}\,J(\Theta - \theta_0)$ as

$$\pi_n^*(u) = \frac{\pi(\theta_0 + n^{-1/2}J^{-1}u)\prod_{i=1}^n f(X_i;\theta_0 + n^{-1/2}J^{-1}u)}{\int_U \pi(\theta_0 + n^{-1/2}J^{-1}u)\prod_{i=1}^n f(X_i;\theta_0 + n^{-1/2}J^{-1}u)\,du}. \tag{2.3}$$

Along the same lines as Portnoy [18] and Ghosal [12], we require conditions on the growth rates of the third and fourth moments. The following quantities therefore play an important role in the analysis:

$$B_{1n}(c) = \sup\Big\{ E_\theta\big[|\langle a, V\rangle|^3\big] : a \in S^{d-1},\ \|J(\theta-\theta_0)\|^2 \le \frac{cd}{n} \Big\}, \tag{2.4}$$

$$B_{2n}(c) = \sup\Big\{ E_\theta\big[\langle a, V\rangle^4\big] : a \in S^{d-1},\ \|J(\theta-\theta_0)\|^2 \le \frac{cd}{n} \Big\}, \tag{2.5}$$

Here $V$ is a random variable distributed as $J^{-1}(U - E_\theta[U])$, and $U$ has density $f(\cdot;\theta)$ as defined in (2.1). Combining (2.4) and (2.5) gives the quantity that is key to bounding deviations from normality of the posterior in a neighborhood of the true parameter:



$$\lambda_n(c) := \frac{1}{6}\sqrt{\frac{cd}{n}}\,B_{1n}(0) + \frac{1}{24}\,\frac{cd}{n}\,B_{2n}(c). \tag{2.6}$$

Note that $\lambda_n(c)$ differs from, and is in fact smaller than, the quantity defined in [12]. In Section 4 we provide a sufficient condition under which we derive sharp bounds on $\lambda_n(c)$.

Next we state the main results of this section. 

Theorem 1 For any constant $c > 0$ suppose that:

(i) $B_{1n}(c)\sqrt{d/n} \to 0$;

(ii) $\lambda_n(c)\,d \to 0$;

(iii) $\|F^{-1}\|\,d/n \to 0$;

(iv) the prior density $\pi$ satisfies $\sup_\theta \ln\big[\pi(\theta)/\pi(\theta_0)\big] \le O(d)$, and
$$|\ln\pi(\theta) - \ln\pi(\theta_0)| \le K_n(c)\,\|\theta - \theta_0\|$$
for any $\theta$ such that $\|\theta - \theta_0\| \le \sqrt{\|F^{-1}\|\,cd/n}$. We require $K_n(c)\sqrt{\|F^{-1}\|\,cd/n} \to 0$.

Then we have asymptotic normality of the posterior density function, that is,

$$\int |\pi_n^*(u) - \phi_d(u;\Delta_n, I_d)|\,du \to_p 0.$$
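The following informal illustration of Theorem 1 is our own and not part of the original analysis. It takes the simplest case $d = 1$: a Poisson family in natural form, $\psi(\theta) = e^\theta$, with a flat (improper) prior, so that (iv) holds with $K_n(c) = 0$. The sketch evaluates $\pi_n^*(u)$ from (2.3) on a grid and computes the $L^1$ distance to $\phi_1(u;\Delta_n,1)$. The function name `l1_to_normal`, the grid settings and the choice of family are assumptions made only for this example. A fixed $d$ does not exercise the increasing-dimension part of the theorem.

```python
import numpy as np


def l1_to_normal(x, theta0, half_width=8.0, num=20001):
    """L1 distance between the flat-prior posterior of u and N(Delta_n, 1).

    Poisson family in natural form: psi(theta) = exp(theta), so that
    mu = psi'(theta0) = exp(theta0) and F = psi''(theta0) = exp(theta0).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    mu = np.exp(theta0)
    J = np.sqrt(np.exp(theta0))                     # J = F^{1/2} (scalar case)
    delta = np.sqrt(n) * (x.mean() - mu) / J        # Delta_n
    u = np.linspace(delta - half_width, delta + half_width, num)
    theta = theta0 + u / (np.sqrt(n) * J)           # theta0 + n^{-1/2} J^{-1} u
    # Log posterior in u up to a constant (flat prior): <sum x, theta> - n psi(theta)
    log_post = x.sum() * theta - n * np.exp(theta)
    post = np.exp(log_post - log_post.max())
    du = u[1] - u[0]
    post /= post.sum() * du                         # normalize on the grid
    normal = np.exp(-0.5 * (u - delta) ** 2) / np.sqrt(2.0 * np.pi)
    return float(np.sum(np.abs(post - normal)) * du)


rng = np.random.default_rng(0)
theta0 = 0.5
for n in (20, 200, 5000):
    x = rng.poisson(np.exp(theta0), size=n)
    print(n, round(l1_to_normal(x, theta0), 4))
```

With a fixed seed, the distance typically shrinks as $n$ grows, consistent with the $n^{-1/2}$ third-moment term in $\lambda_n(c)$. The grid half-width of 8 keeps essentially all posterior mass, because the posterior of $u$ has roughly unit spread.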



As mentioned earlier, Theorem 1 imposes different assumptions on the prior than Theorem 3 of [12]. On the other hand, Theorem 1 does not require the additional technical assumptions used in [12], as discussed in Appendix B. Moreover, the growth condition on $d$ relative to the sample size $n$ is improved by at least $\ln d$ factors.

Some applications may call for stronger convergence properties than asymptotic normality alone. The following theorem provides sufficient conditions for the convergence of $\alpha$-th moments.

Theorem 2 For some sequence of $\alpha$ and $d \to \infty$, let

$$M_{d,\alpha} := (d+\alpha)\Big(1 + \frac{\alpha\ln(d+\alpha)}{d+\alpha}\Big).$$

Suppose that the following strengthening of assumptions (ii) and (iv) holds for any fixed $c$:

(ii') $\lambda_n(cM_{d,\alpha}/d)\,(cM_{d,\alpha})^{1+\alpha/2} \to 0$;

(iv') $K_n(cM_{d,\alpha}/d)\sqrt{\|F^{-1}\|\,cM_{d,\alpha}/n} \to 0$.

Then we have

$$\int \|u\|^\alpha\,|\pi_n^*(u) - \phi_d(u;\Delta_n, I_d)|\,du \to_p 0. \tag{2.7}$$

We emphasize that Theorem 2 allows $\alpha$ and $d$ to grow as the sample size increases. Our conditions highlight a polynomial trade-off between $n$ and $d$ but an exponential trade-off between $n$ and $\alpha$. This suggests that estimating higher moments in increasing dimension applications could be very delicate. Conditions (ii') and (iv') simplify significantly if $\alpha\ln d = o(d)$, in which case $M_{d,\alpha} \sim d$.
Comment 2.1 Suppose that we also want to allow $\alpha$ to grow with the sample size. If $d$ grows at a polynomial rate with respect to $n$, our results do not allow for $\alpha = O(\ln n)$. Some limitation along these lines should be expected, since there is an exponential trade-off between $\alpha$ and $n$. However, both the dimension and $\alpha$ can grow with the sample size if $\alpha = O(\sqrt{\ln n})$. Such slow growth conditions illustrate the potential limitations for the practical estimation of higher order moments.

3. Curved Exponential Family. Next we consider the case of a curved exponential family. Being a generalization of the canonical exponential family, its analysis has many similarities with the previous setup.

Let $X_1, X_2, \ldots, X_n$ be iid observations from a $d$-dimensional curved exponential family whose density function is given by

$$f(x;\eta) = \exp\big(\langle x, \theta(\eta)\rangle - \psi(\theta(\eta))\big),$$

Here $\eta \in \Psi \subset \mathbb{R}^{d_1}$ and $\theta: \Psi \to \Theta$, where $\Theta$ is an open subset of $\mathbb{R}^d$; as before, $d \to \infty$ as $n \to \infty$. In this section we assume that $J = I_d$ for notational convenience.

The parameter of interest is $\eta$, whose true value $\eta_0$ lies in the interior of the set $\Psi \subset \mathbb{R}^{d_1}$. The true value of $\theta$ induced by $\eta_0$ is given by $\theta_0 = \theta(\eta_0)$. The mapping $\eta \mapsto \theta(\eta)$ takes values from $\mathbb{R}^{d_1}$ to $\mathbb{R}^d$, where $c\cdot d \le d_1 \le d$ for some $c > 0$. Moreover, assume that $\eta_0$ is the unique solution to the system $\theta(\eta) = \theta_0$.

Thus, the parameter $\theta$ corresponds to a high-dimensional linear parametrization of the log-density, and $\eta$ describes the lower-dimensional parametrization of the density of interest. We require the following regularity conditions on the mapping $\theta(\cdot)$.

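Assumption A. For every $\kappa$, and uniformly in $\gamma \in B(0, \kappa\sqrt{d})$, there exists a linear operator $G: \mathbb{R}^{d_1} \to \mathbb{R}^d$ such that $G'G$ has eigenvalues bounded from above and away from zero, and for every $n$

$$\sqrt{n}\big(\theta(\eta_0 + \gamma/\sqrt{n}) - \theta(\eta_0)\big) = r_{1n} + (I + R_{2n})G\gamma, \tag{3.8}$$

where $\|r_{1n}\| \le \delta_{1n}$ and $\|R_{2n}\| \le \delta_{2n}$. Moreover, those coefficients are such that

$$\delta_{1n}\,d^{1/2} \to 0 \quad\text{and}\quad \delta_{2n}\,d \to 0. \tag{3.9}$$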
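To make Assumption A concrete, the sketch below uses the hypothetical smooth map $\theta(\eta) = (\eta, \eta\odot\eta)$ from $\mathbb{R}^{d_1}$ to $\mathbb{R}^{2d_1}$. This map is not one of the paper's examples; it is chosen only because everything can be computed by hand. Here $G = [I;\ 2\,\mathrm{diag}(\eta_0)]$, so $G'G = I + 4\,\mathrm{diag}(\eta_0^2)$ has eigenvalues in $[1,5]$ when $|\eta_{0,j}| \le 1$, and $R_{2n} = 0$. The remainder in (3.8) is then $r_{1n} = (0,\ \gamma\odot\gamma/\sqrt{n})$, so that $\|r_{1n}\| \le \|\gamma\|^2/\sqrt{n} \le \kappa^2 d/\sqrt{n}$ on $B(0,\kappa\sqrt{d})$ with $d = 2d_1$. All names and numerical values are illustrative.

```python
import numpy as np


def theta_map(eta):
    """Hypothetical curved map eta -> (eta, eta**2) from R^{d1} to R^{2 d1}."""
    return np.concatenate([eta, eta ** 2])


def remainder(eta0, gamma, n):
    """r_1n in (3.8) for this map, with G the Jacobian at eta0 and R_2n = 0."""
    d1 = eta0.size
    G = np.vstack([np.eye(d1), 2.0 * np.diag(eta0)])
    lhs = np.sqrt(n) * (theta_map(eta0 + gamma / np.sqrt(n)) - theta_map(eta0))
    return lhs - G @ gamma


rng = np.random.default_rng(0)
d1, n, kappa = 50, 10_000, 2.0
d = 2 * d1                                           # dimension of theta
eta0 = rng.uniform(-1.0, 1.0, size=d1)
gamma = rng.standard_normal(d1)
gamma *= kappa * np.sqrt(d) / np.linalg.norm(gamma)  # ||gamma|| = kappa * sqrt(d)
r = remainder(eta0, gamma, n)
bound = kappa ** 2 * d / np.sqrt(n)
print(np.linalg.norm(r), bound)
```

For any twice differentiable map, $\delta_{1n}$ is of this order $d/\sqrt{n}$. Condition (3.9) then asks for $d^{3/2}/\sqrt{n} \to 0$, that is, $d^3/n \to 0$.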

Assumption B. There exists a strictly positive constant $\varepsilon_0$ such that for every $\eta \in \Psi$ (uniformly in $n$) we have

$$\|\theta(\eta) - \theta(\eta_0)\| \ge \varepsilon_0\,\|\eta - \eta_0\|. \tag{3.10}$$



POSTERIOR INFERENCE IN CURVED EXPONENTIAL 







Fig 1. This figure illustrates the mapping $\theta(\cdot)$. The (discontinuous) solid line is the mapping, while the dashed line represents the linear map induced by $G$. The dash-dot line represents the deviation band controlled by $r_{1n}$ and $R_{2n}$.



Thus the mapping $\eta \mapsto \theta(\eta)$ is allowed to be nonlinear and discontinuous. For example, the additional condition $\delta_{1n} = 0$ implies the continuity of the mapping in a neighborhood of $\eta_0$. More generally, condition (3.9) requires the map to admit an approximate linearization in the neighborhood of $\eta_0$, whose quality is controlled by the errors $\delta_{1n}$ and $\delta_{2n}$. Figure 1 gives an example of a map allowed in this framework.

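Again, given a prior $\pi$ on $\Theta$, the posterior of $\eta$ given the data is denoted by

$$\pi_n(\eta) \propto \pi(\theta(\eta))\cdot\prod_{i=1}^n f(X_i;\eta) = \pi(\theta(\eta))\cdot\exp\big(n\langle \bar X, \theta(\eta)\rangle - n\psi(\theta(\eta))\big),$$

where $\bar X = \frac{1}{n}\sum_{i=1}^n X_i$.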

To describe contiguous deviations from the true parameter, we define the local parameter $\gamma = \sqrt{n}(\eta - \eta_0)$. Let $s = (G'G)^{-1}G'\sqrt{n}(\bar X - \mu)$ be (once more) a first order approximation to the normalized maximum likelihood/extremum estimate. Again, similar bounds hold for $s$: $E[s] = 0$, $E[ss^T] = (G'G)^{-1}$, and $\|s\| = O_p(\sqrt{d})$. The posterior density of $\gamma$ over $\Gamma$, where $\Gamma = \sqrt{n}(\Psi - \eta_0)$, is $\pi_n^*(\gamma) = \ell(\gamma)/\int_\Gamma \ell(\gamma)\,d\gamma$, where

$$\ell(\gamma) = \exp\Big(n\big\langle \bar X, \theta(\eta_0 + n^{-1/2}\gamma) - \theta(\eta_0)\big\rangle - n\big[\psi(\theta(\eta_0 + n^{-1/2}\gamma)) - \psi(\theta(\eta_0))\big]\Big)\cdot\pi\big(\theta(\eta_0 + n^{-1/2}\gamma)\big). \tag{3.11}$$
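By construction we have

$$\ell(\gamma) = \pi\big(\theta(\eta_0 + n^{-1/2}\gamma)\big)\,Z_n\big(n^{1/2}[\theta(\eta_0 + n^{-1/2}\gamma) - \theta(\eta_0)]\big) = \pi\big(\theta(\eta_0 + n^{-1/2}\gamma)\big)\,Z_n(u_\gamma),$$

where $u_\gamma \in U \subset \mathbb{R}^d$ and $Z_n$ is the likelihood ratio defined in (B.24).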

Next we show that the tails have small mass outside a $\sqrt{d}$-neighborhood in $\Gamma$. We also need an additional condition on $a_n$, which is defined in the appendix and restated here for the reader's convenience:

$$a_n = \sup\{c : \lambda_n(c) \le 1/16\}.$$

Therefore, using Lemma 7, we can still bound $Z_n$ from above by a proper Gaussian in a neighborhood of size $\sqrt{a_n d}$. The next lemma requires that $\log d = o(a_n)$. This is a substantially weaker condition than the one used in [12] to establish asymptotic normality of the posterior of (regular) exponential densities, namely $\lambda_n(c\log d)\,d = o(1)$.

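Lemma 1 Assume that (i), (ii), (iii), and (iv) hold. In addition, suppose that $\log d = o(a_n)$. Then, for some constant $k$ independent of $d$ and $d_1$, we have

$$\int_{\Gamma\setminus B(0,k\sqrt{d})} \pi\big(\theta(\eta_0) + n^{-1/2}u_\gamma\big)\,Z_n(u_\gamma)\,d\gamma = o\Big(\int_\Gamma \pi\big(\theta(\eta_0) + n^{-1/2}u_\gamma\big)\,Z_n(u_\gamma)\,d\gamma\Big).$$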



Comment 3.1 The only assumption made on $d_1$ in the previous lemma was that $d_1 \le d$. If $d_1\log d = o(d)$, the proof simplifies significantly (there is no need to define region (II)).

Next we address the consistency of the maximum likelihood estimator associated with the curved exponential family.

Theorem 3 In addition to Assumptions A and B, suppose that $a_n \to \infty$ and (iv) hold. Then the maximum likelihood estimator $\hat\eta$ satisfies

$$\|\hat\eta - \eta_0\| = O_p\big(\sqrt{d/n}\big).$$



Two remarks regarding Theorem 3 are worth mentioning. First, a sufficient condition for $a_n \to \infty$ is simply $\lambda_n(c) \to 0$. This is stronger than the condition $\sqrt{d/n}\,B_{1n}(c) \to 0$ that Ghosal [12] needed for consistency in the exponential case. Second, our consistency result relies on the dimension $d$ of the larger model.

Finally, we can state the asymptotic normality result for the curved exponential family.
Theorem 4 Suppose that Assumptions A and B and conditions (i), (ii), (iii), and (iv) hold. In addition, suppose that $\log d = o(a_n)$. Then asymptotic normality holds for the posterior density associated with the curved exponential family:

$$\int \big|\pi_n^*(\gamma) - \phi_{d_1}\big(\gamma; s, (G'G)^{-1}\big)\big|\,d\gamma \to_p 0.$$



4. Controlling $\lambda_n(c)$. In this section we derive a new bound on the fundamental quantity

$$\lambda_n(c) = \frac{1}{6}\sqrt{\frac{cd}{n}}\,B_{1n}(0) + \frac{1}{24}\,\frac{cd}{n}\,B_{2n}(c),$$

which plays a key role in bounding deviations from normality. We start by restating the following theorem for log-concave distributions.

Theorem 5 (Lovász and Vempala [16]) If $X$ is a random vector from a log-concave distribution in $\mathbb{R}^d$, then

$$E\big[\|X\|^k\big]^{1/k} \le 2k\,E\big[\|X\|\big] \le 2k\,E\big[\|X\|^2\big]^{1/2}.$$



This result provides a reverse direction of the Hölder inequality, which allows us to control higher moments using the second moment. Since we will be bounding moments of random variables in the exponential family, whose densities are log-concave, we can apply Theorem 5.
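A quick Monte Carlo check of Theorem 5 in dimension one makes the constants in Lemmas 2 and 4 more tangible. The sketch standardizes two log-concave distributions to mean zero and unit variance. The exponential and Laplace laws are chosen here only for illustration. For $k = 3, 4$ it compares $E[|X|^k]^{1/k}$ with the bound $2k\,E[X^2]^{1/2} \approx 2k$. The gap is large, which is consistent with $B_{1n}(0) \le 6^3$ being a crude but dimension-free bound.

```python
import numpy as np


def standardized_samples(kind, size, rng):
    """Draws from a log-concave law, shifted and scaled to mean 0 and variance 1."""
    if kind == "exponential":
        x = rng.exponential(1.0, size)        # mean 1, variance 1
        return x - 1.0
    if kind == "laplace":
        x = rng.laplace(0.0, 1.0, size)       # mean 0, variance 2
        return x / np.sqrt(2.0)
    raise ValueError(kind)


rng = np.random.default_rng(0)
results = {}
for kind in ("exponential", "laplace"):
    x = standardized_samples(kind, 1_000_000, rng)
    second = np.sqrt(np.mean(x ** 2))         # approximately 1
    for k in (3, 4):
        lhs = np.mean(np.abs(x) ** k) ** (1.0 / k)
        rhs = 2 * k * second
        results[(kind, k)] = (lhs, rhs)
        print(f"{kind:12s} k={k}: (E|X|^k)^(1/k) = {lhs:.3f} <= {rhs:.3f}")
```

For the standardized exponential, $E[X^4] = 9$, so the $k = 4$ estimate should be close to $9^{1/4} \approx 1.73$, far below the bound $8$.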

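In what follows we consider $\theta \in \mathcal{R}_c = \{\theta \in \Theta : \|J(\theta - \theta_0)\| \le \sqrt{cd/n}\}$ and $U \sim f(\cdot;\theta)$, and we let $H_\theta = E_\theta\big[(U - E_\theta[U])(U - E_\theta[U])'\big]^{1/2}$. In this notation $J = H_{\theta_0}$.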

We first bound the third moment term $B_{1n}(0)$. In this case, since the variable of interest $\langle a, V\rangle$ is properly normalized to have unit variance, its third moment is bounded by a constant.

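Lemma 2 (Bound on $B_{1n}$) We have that $B_{1n}(0) \le 6^3$.

Proof. Let $V = J^{-1}(U - E[U])$, where $U \sim f_{\theta_0}$. Therefore $V$ has a log-concave density function, $E[V] = 0$, and $E[VV'] = I_d$. Since $\langle a, V\rangle$ is a one-dimensional log-concave variable, Theorem 5 with $k = 3$ gives

$$B_{1n}(0) = \sup_{\|a\|=1} E_{\theta_0}\big[|\langle a, V\rangle|^3\big] \le 6^3 \sup_{\|a\|=1} E\big[\langle a, V\rangle^2\big]^{3/2} = 6^3. \qquad\blacksquare$$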

Before we proceed to bound the term $B_{2n}$ in $\lambda_n$, we state and prove the following technical lemma.
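Lemma 3 Let $X$ be a random vector in $\mathbb{R}^d$ and $M$ be a $d \times d$ matrix. We have that

$$\sup_{\|a\|=1} E\big[|\langle a, MX\rangle|^k\big] \le \|M\|^k \sup_{\|a\|=1} E\big[|\langle a, X\rangle|^k\big].$$

Proof. Let $\bar a$ achieve the supremum on the left-hand side (if $M'\bar a = 0$ there is nothing to prove). Then we have

$$E\big[|\langle \bar a, MX\rangle|^k\big] = E\big[|\langle M'\bar a, X\rangle|^k\big] = \|M'\bar a\|^k\,E\Big[\Big|\Big\langle \frac{M'\bar a}{\|M'\bar a\|}, X\Big\rangle\Big|^k\Big] \le \|M\|^k \sup_{\|a\|=1} E\big[|\langle a, X\rangle|^k\big]. \qquad\blacksquare$$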



Unlike in Lemma 2, we need to bound the fourth moment in a vanishing neighborhood of $\theta_0$. This requires an additional assumption: $H_\theta$ must be sufficiently close to $J$ for any $\theta$ in this neighborhood of $\theta_0$. This additional condition is […] than the conditions of Theorem 2.4 in [12].

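Lemma 4 Assume that $\|I - H_\theta^{-1}J\| \le 1/2$ (in the operator norm). Then we have that

$$\sup_{\|a\|=1} E_\theta\big[|\langle a, V\rangle|^k\big] \le 2^{2k}k^k.$$

Proof. By convexity of $t \mapsto t^k$ ($k \ge 1$) we have $(t + s)^k \le 2^{k-1}(t^k + s^k)$ for $t, s \ge 0$, and Lemma 3 yields

$$\begin{aligned}
\sup_{\|a\|=1} E_\theta\big[|\langle a, V\rangle|^k\big] &= \sup_{\|a\|=1} E_\theta\big[|\langle a, (I - H_\theta^{-1}J + H_\theta^{-1}J)V\rangle|^k\big]\\
&\le 2^{k-1}\sup_{\|a\|=1} E_\theta\big[|\langle a, (I - H_\theta^{-1}J)V\rangle|^k\big] + 2^{k-1}\sup_{\|a\|=1} E_\theta\big[|\langle a, H_\theta^{-1}JV\rangle|^k\big]\\
&\le 2^{k-1}\|I - H_\theta^{-1}J\|^k\sup_{\|a\|=1} E_\theta\big[|\langle a, V\rangle|^k\big] + 2^{k-1}\sup_{\|a\|=1} E_\theta\big[|\langle a, H_\theta^{-1}JV\rangle|^k\big].
\end{aligned}$$

Since $\|I - H_\theta^{-1}J\| \le 1/2$, the first term is at most half of the left-hand side, and therefore

$$\sup_{\|a\|=1} E_\theta\big[|\langle a, V\rangle|^k\big] \le 2^k\sup_{\|a\|=1} E_\theta\big[|\langle a, H_\theta^{-1}JV\rangle|^k\big].$$

Now we invoke Theorem 5 to obtain

$$\sup_{\|a\|=1} E_\theta\big[|\langle a, V\rangle|^k\big] \le 2^k\cdot(2k)^k\sup_{\|a\|=1} E_\theta\big[\langle a, H_\theta^{-1}JV\rangle^2\big]^{k/2} = 2^{2k}k^k,$$

since $E_\theta\big[(H_\theta^{-1}JV)(H_\theta^{-1}JV)'\big] = I_d$. $\blacksquare$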



Corollary 1 (Bound on $B_{2n}(c)$) Assume that $\|I - H_\theta^{-1}J\| \le 1/2$ (using the operator norm) for any $\theta \in \mathcal{R}_c$. Then we have that $B_{2n}(c) \le 2^{16}$.

5. Applications. In this section we go over applications of both exponential and curved exponential families under increasing dimension.

5.1. Multinomial Model. The first example we consider is the multinomial distribution, which Ghosal also analyzed in [12].

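Let $\mathcal{X} = \{x^0, x^1, \ldots, x^d\}$ be the known finite support of a multinomial random variable $X$, where $d$ is allowed to grow with the sample size $n$. For each $i$, denote by $p_i$ the probability of the event $\{X = x^i\}$, which is assumed to satisfy $\max_i 1/p_i = O(d)$. The parameter is $\theta = (\theta_1, \ldots, \theta_d)$, where $\theta_i = \log\big(p_i/(1 - \sum_{j=1}^d p_j)\big)$. Under the assumption on the $p_i$'s, the true values of the $\theta_i$'s are bounded. The Fisher information matrix is $F = P - pp'$, where $P = \mathrm{diag}(p)$ and $p = (p_1, \ldots, p_d)'$. Using a rank-one update formula, we have

$$F^{-1} = P^{-1} + \frac{P^{-1}pp'P^{-1}}{1 - p'P^{-1}p} = P^{-1} + \frac{\mathbf{1}\mathbf{1}'}{p_0}, \tag{5.12}$$

where $\mathbf{1}$ is the vector of ones and $p_0 = 1 - \sum_{j=1}^d p_j$. Therefore we have $\|F^{-1}\| \le \mathrm{trace}(F^{-1}) = \sum_{i=1}^d\big(\frac{1}{p_i} + \frac{1}{p_0}\big) \le O(d^2)$. It is also possible to derive expressions for $J = F^{1/2}$ and its inverse:

$$J = P^{1/2} - \frac{pp'P^{-1/2}}{1 + \sqrt{1 - p'P^{-1}p}} \quad\text{and}\quad J^{-1} = P^{-1/2} + \frac{P^{-1/2}pp'P^{-1}}{1 - p'P^{-1}p + \sqrt{1 - p'P^{-1}p}}.$$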
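The closed forms (5.12) and the expressions for $J$ and $J^{-1}$ are easy to get wrong by a transpose, so a numerical check is cheap insurance. The sketch below builds $F = P - pp'$ for a random probability vector $(p_0, p_1, \ldots, p_d)$ and compares the rank-one formula with a direct inverse. It then confirms $JJ' = F$ and $J^{-1}J = I$; note that this $J$ is not symmetric. Finally, it checks the trace identity used to bound $\|F^{-1}\|$. The function name `multinomial_pieces` is hypothetical.

```python
import numpy as np


def multinomial_pieces(p_full):
    """Return F, F^{-1}, J, J^{-1} for cell probabilities p_full = (p_0, ..., p_d)."""
    p_full = np.asarray(p_full, dtype=float)
    p0, p = p_full[0], p_full[1:]
    d = p.size
    P = np.diag(p)
    F = P - np.outer(p, p)                                  # Fisher information
    ones = np.ones(d)
    F_inv = np.diag(1.0 / p) + np.outer(ones, ones) / p0    # rank-one formula (5.12)
    ptPinvp = np.sum(p)                                     # p'P^{-1}p = sum_i p_i = 1 - p_0
    s = np.sqrt(1.0 - ptPinvp)                              # equals sqrt(p_0)
    P_half = np.diag(np.sqrt(p))
    P_mhalf = np.diag(1.0 / np.sqrt(p))
    J = P_half - np.outer(p, p) @ P_mhalf / (1.0 + s)
    J_inv = P_mhalf + P_mhalf @ np.outer(p, p) @ np.diag(1.0 / p) / (s ** 2 + s)
    return F, F_inv, J, J_inv


rng = np.random.default_rng(0)
d = 30
p_full = rng.dirichlet(np.ones(d + 1))
F, F_inv, J, J_inv = multinomial_pieces(p_full)
print(np.allclose(F_inv, np.linalg.inv(F)),
      np.allclose(J @ J.T, F),
      np.allclose(J_inv @ J, np.eye(d)))
trace_formula = np.sum(1.0 / p_full[1:] + 1.0 / p_full[0])
print(np.isclose(np.trace(F_inv), trace_formula))
```

With uniform cell probabilities, $1/p_i = d + 1$ and the trace equals $2d(d+1)$, which shows that the $O(d^2)$ bound on $\|F^{-1}\|$ via the trace is attained up to constants.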



To bound $\lambda_n$, we need to bound the third and fourth moments of the random variable that defines $B_{1n}$ and $B_{2n}$. Let $a \in S^{d-1}$, and let $q$ be distributed as $f(\cdot;\theta)$. We have that

$$\langle a, J^{-1}(q - p)\rangle = \sum_{i=1}^d \frac{a_i}{\sqrt{p_i}}(q_i - p_i) + \frac{\sum_{i=1}^d a_i\sqrt{p_i}}{p_0 + \sqrt{p_0}}\sum_{i=1}^d (q_i - p_i).$$

Under the assumption on the $p_i$, it can be shown that $B_{1n}(c) = O(d^{3/2})$ and $B_{2n}(c) = O(d^2)$.






BELLONI AND CHERNOZHUKOV 



The relations above were derived by Ghosal in [12], where the growth condition $d^6(\ln d)/n \to 0$ was imposed to obtain the asymptotic normality results (the case $\alpha = 0$). We relax this growth requirement by combining Ghosal's approach with our analysis and an uninformative (improper) prior. In this case $K_n(c) = 0$, and our definition of $\lambda_n$ removes the logarithmic factors. Therefore, Theorem 1 leads to a weaker growth condition, requiring only that $d^4/n \to 0$. Moreover, the results of Theorem 2.4 of [12] now follow under the weaker growth condition $d^5/n \to 0$, which replaces the previous condition $d^6(\log d)/n \to 0$. For higher moment estimation ($\alpha > 0$), the conditions of Theorem 2 are satisfied provided that $d^{4+\alpha+\delta}/n \to 0$ for some strictly positive $\delta$.

5.2. Multinomial Model with Moment Restrictions. In this subsection we provide a high-level discussion of the multinomial model with moment restrictions. Let $\mathcal{X} = \{x^0, x^1, x^2, \ldots, x^d\}$ be the known finite support of a multinomial random variable $X$, as described in Section 5.1. Conditions (i)-(iv) were verified in the same section.

As discussed in the introduction, it is of interest to incorporate moment restrictions into this model; see Imbens [11] for a discussion. This will lead to a curved exponential model as studied in Section 3.

The parameter of interest is $\eta \in \Psi \subset \mathbb{R}^{d_1}$, where $\Psi$ is a compact set. Consider a (twice continuously differentiable) vector-valued moment function $m: \mathcal{X}\times\Psi \to \mathbb{R}^M$ such that

$$E[m(X,\eta)] = 0 \text{ for a unique } \eta_0 \in \Psi.$$

The case of interest is the one in which the cardinality $d$ of the support $\mathcal{X}$ is larger than the number of moment conditions $M$, which in turn is larger than the dimension $d_1$ of the parameter of interest $\eta$. The log-likelihood function associated with this model is

$$l(q,\eta) = \sum_{i=1}^n\sum_{j=0}^d 1\{X_i = x^j\}\ln q_j \tag{5.13}$$

and $l(q,\eta) = -\infty$ if $q$ violates any of the moment conditions. This log-likelihood function induces the mapping $q: \Psi \to \Delta^{d-1}$, formally defined as

$$q(\eta) = \arg\max_q\ l(q,\eta) \quad\text{subject to}\quad \sum_{j=0}^d q_j\,m(x^j,\eta) = 0,\quad q_0 = 1 - \sum_{j=1}^d q_j. \tag{5.14}$$
As discussed in Section 5.1, the function $\theta_j(\eta) = \log(q_j(\eta)/q_0(\eta))$ (for $j = 1,\ldots,d$) is the natural mapping $\theta(\cdot): \Psi \to \Theta$. Assume that the matrix $E[m(X,\eta)m(X,\eta)']$ is uniformly positive definite over $\eta$. Qin and Lawless [19] then use the inverse function theorem to show that $\theta$ is a twice continuously differentiable mapping of $\eta$ in a neighborhood of $\eta_0$. In particular, this implies that Assumption A holds with $\delta_{2n} = 0$ and $\delta_{1n} = O(d/\sqrt{n})$, for which it suffices to have $d^3/n \to 0$.
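For a single scalar moment condition, the constrained maximization in (5.14) has the familiar empirical-likelihood (Lagrange multiplier) form $q_j = w_j/(1 + \lambda\,m(x^j,\eta))$. Here the $w_j$ are the cell frequencies, and $\lambda$ solves $\sum_j w_j m(x^j,\eta)/(1 + \lambda m(x^j,\eta)) = 0$. The sketch below makes three assumptions: the moment is scalar, the weights are strictly positive, and $0$ lies strictly inside the range of $m(\cdot,\eta)$. Under these assumptions it computes $q(\eta)$ by bisection on $\lambda$ and then evaluates the curved map $\theta_j(\eta) = \log(q_j(\eta)/q_0(\eta))$. The function names and the mean restriction $m(x,\eta) = x - \eta$ used in the example are illustrative, not taken from the paper.

```python
import numpy as np


def tilted_probabilities(w, m, iters=200):
    """Maximize sum_j w_j log q_j  s.t.  sum_j q_j m_j = 0 and sum_j q_j = 1.

    The solution is q_j = w_j / (1 + lam * m_j). The function
    g(lam) = sum_j w_j m_j / (1 + lam * m_j) is strictly decreasing on the interval
    where every 1 + lam * m_j > 0, so bisection finds its root.
    """
    w = np.asarray(w, dtype=float)
    m = np.asarray(m, dtype=float)
    if not (m.min() < 0.0 < m.max()):
        raise ValueError("0 must lie strictly inside the range of m")
    lo, hi = -1.0 / m.max(), -1.0 / m.min()       # feasible multipliers
    eps = 1e-12 * (hi - lo)
    a, b = lo + eps, hi - eps
    g = lambda lam: np.sum(w * m / (1.0 + lam * m))
    for _ in range(iters):
        mid = 0.5 * (a + b)
        if g(mid) > 0.0:
            a = mid
        else:
            b = mid
    lam = 0.5 * (a + b)
    return w / (1.0 + lam * m)


def theta_of_eta(support, w, eta):
    """Curved map theta_j(eta) = log(q_j / q_0), j = 1..d, for m(x, eta) = x - eta."""
    q = tilted_probabilities(w, support - eta)
    return np.log(q[1:] / q[0]), q


support = np.array([0.0, 1.0, 2.0, 3.0])
w = np.full(4, 0.25)                               # cell frequencies (mean 1.5)
theta, q = theta_of_eta(support, w, eta=1.2)
print(q, q.sum(), q @ (support - 1.2))
```

When $\eta$ equals the mean of the weights ($1.5$ here), the multiplier is $\lambda = 0$ and $q = w$. Moving $\eta$ tilts the probabilities smoothly, which is the local behavior that the inverse function theorem argument formalizes.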

To verify Assumption B, we use the fact that the parameter $\eta$ belongs to a compact set, and we assume that the mapping is injective (over a set that contains $\Psi$ in its interior). We refer to Newey and McFadden [17] for a discussion of primitive assumptions for identification with moment restrictions.

5.3. Multivariate Linear Model. Next we consider the multivariate linear model. The response variable $y$ is a $d_r$-dimensional vector, and the disturbances $u$ are normally distributed with mean zero and covariance matrix $\Sigma_0$. The covariates $z$ are $d_c$-dimensional, and the parameter matrix of interest $\Pi$ is $d_c \times d_r$:

$$y_i = z_i\Pi + u_i, \quad i = 1,\ldots,n. \tag{5.15}$$

For notational convenience, let $Y$ and $Z$ denote the matrices whose rows are given by $y_i$ and $z_i$, respectively. Note that the dimension of the model is $d = d_r^2 + d_c d_r$.

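This model can be cast as an exponential family model by the following parametrization

$$\theta = (\theta_1, \theta_2) = \big(-\tfrac{1}{2}\Sigma^{-1},\ \Pi\Sigma^{-1}\big), \tag{5.16}$$

together with the (trace) inner product $\langle\theta, X\rangle = \mathrm{trace}(X_1\theta_1) + \mathrm{trace}(X_2'\theta_2)$. This parametrization leads to the normalizing function

$$\psi(\theta) = -\frac{1}{4n}\,\mathrm{trace}\big(Z\theta_2\theta_1^{-1}\theta_2'Z'\big) - \frac{1}{2}\log\det(-2\theta_1). \tag{5.17}$$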
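The parametrization and the normalizing function (5.17) can be checked numerically against the Gaussian log-likelihood of (5.15). The sketch assumes that the sufficient statistics are $X_1 = Y'Y$ and $X_2 = Z'Y$, and it takes $\theta_1 = -\frac{1}{2}\Sigma^{-1}$ and $\theta_2 = \Pi\Sigma^{-1}$, the choice consistent with (5.17). It then compares $\mathrm{trace}(\theta_1Y'Y) + \mathrm{trace}(\theta_2'Z'Y) - n\psi(\theta) - \frac{nd_r}{2}\log(2\pi)$ with the direct formula; the two should agree up to rounding. The function names are hypothetical.

```python
import numpy as np


def loglik_direct(Y, Z, Pi, Sigma):
    """Gaussian log-likelihood of Y = Z Pi + U with rows of U iid N(0, Sigma)."""
    n, dr = Y.shape
    R = Y - Z @ Pi
    _, logdet = np.linalg.slogdet(Sigma)
    quad = np.trace(R @ np.linalg.solve(Sigma, R.T))   # sum_i r_i Sigma^{-1} r_i'
    return -0.5 * quad - 0.5 * n * (dr * np.log(2.0 * np.pi) + logdet)


def psi(theta1, theta2, Z):
    """Normalizing function (5.17), per observation."""
    n = Z.shape[0]
    _, logdet = np.linalg.slogdet(-2.0 * theta1)
    M = Z @ theta2 @ np.linalg.inv(theta1) @ theta2.T @ Z.T
    return -np.trace(M) / (4.0 * n) - 0.5 * logdet


def loglik_expfam(Y, Z, Pi, Sigma):
    """The same log-likelihood written in exponential-family form."""
    n, dr = Y.shape
    Sinv = np.linalg.inv(Sigma)
    theta1, theta2 = -0.5 * Sinv, Pi @ Sinv
    inner = np.trace(theta1 @ Y.T @ Y) + np.trace(theta2.T @ Z.T @ Y)
    return inner - n * psi(theta1, theta2, Z) - 0.5 * n * dr * np.log(2.0 * np.pi)


rng = np.random.default_rng(0)
n, dc, dr = 40, 5, 3
Z = rng.standard_normal((n, dc))
Pi = rng.standard_normal((dc, dr))
A = rng.standard_normal((dr, dr))
Sigma = A @ A.T + dr * np.eye(dr)
Y = Z @ Pi + rng.multivariate_normal(np.zeros(dr), Sigma, size=n)
print(loglik_direct(Y, Z, Pi, Sigma), loglik_expfam(Y, Z, Pi, Sigma))
```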

We make the following assumptions on the design. The covariates satisfy $\max_{i\le n}\|z_i\| \le O(d_c^{1/2})$, the smallest eigenvalue of $Z'Z/n$ is bounded away from zero, the eigenvalues of $\Sigma_0$ are bounded away from zero and from above, and the matrix $\Pi$ has full rank with smallest singular value bounded away from zero.
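The Fisher information matrix $F$ associated with (5.17) is such that, for any direction $\gamma = (\gamma_1, \gamma_2)$, we have (using that $\theta_1$ is negative definite)

$$\begin{aligned}
|F[\gamma,\gamma]| &= -\frac{1}{2n}\mathrm{trace}\big(\theta_1^{-1}\gamma_2'Z'Z\gamma_2\big) + \frac{1}{2}\mathrm{trace}\big(\theta_1^{-1}\gamma_1\theta_1^{-1}\gamma_1\big) - \frac{1}{2n}\mathrm{trace}\big(\theta_1^{-1}\gamma_1\theta_1^{-1}\theta_2'Z'Z\theta_2\theta_1^{-1}\gamma_1\big)\\
&\ge \frac{1}{2}\lambda_{\min}(-\theta_1^{-1})\min\big\{\lambda_{\min}(Z'Z/n),\ \lambda_{\min}(-\theta_1^{-1}),\ \lambda_{\min}(\theta_1^{-1}\theta_2'\theta_2\theta_1^{-1})\big\}\|\gamma\|^2\\
&= \lambda_{\min}(\Sigma)\min\big\{\lambda_{\min}(Z'Z/n),\ 2\lambda_{\min}(\Sigma),\ 4\lambda_{\min}(\Pi'\Pi)\big\}\|\gamma\|^2.
\end{aligned} \tag{5.18}$$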

Under our assumptions, this implies that $\|F^{-1}\| = O(1)$. Since $u \sim N(0,\Sigma)$, where $\Sigma$ is in a neighborhood of $\Sigma_0$, bounding the third and fourth moments gives $B_{1n}(c) = O(d_c^{3/2})$ and $B_{2n}(c) = O(d_r^2 + d_c^2)$. Therefore, Theorem 1 gives asymptotic normality provided that $d_r^{?} = o(n)$ and $d_r d_c^{?} = o(n)$.

5.4. Seemingly Unrelated Regression Equations. The seemingly unrelated regression model (Zellner [21]) considers a collection of $d_r$ models

$$y_k = X_k\beta_k + u_k, \quad k = 1,\ldots,d_r, \tag{5.19}$$

each having $n$ observations, where the dimension of $\beta_k$ is $d_k$. Let $d_c$ denote the total number of distinct covariates. The $d_r$-dimensional vector of disturbances $u$ has zero mean and covariance $\Sigma_0$. This model can be written in the form of (5.15) by setting $\Pi = [\pi_1(\beta_1); \pi_2(\beta_2); \cdots; \pi_{d_r}(\beta_{d_r})]$. Note that the vector $\pi_i(\beta_i)$ has zeros for regressors that do not appear in the $i$th model. Garderen [20] shows that this model is a curved exponential model provided that the matrix $\Pi$ has some zero restrictions (that do not exclude any covariate from all models).

Consider the same assumptions as in Section 5.3. Assume further that $\eta$ has bounded support. We restrict the space of $\Sigma$ to $\lambda_{\min}(\Sigma) \ge \lambda_{\min}$, a fixed constant. Note that this induces $\lambda_{\max}(\Sigma^{-1}) \le 1/\lambda_{\min}$, which leads to a convex region in the parameter space. We also assume that the operator norm of $\eta_2$ is bounded, $\|\eta_2\|_{2\to2} \le M$, where $M > 1$ is a fixed constant. (This imposes only boundedness assumptions on $\Pi$.)

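Note that the mapping $\theta(\cdot)$ is twice differentiable, and it can be shown that for any direction $\Delta\eta = (\Delta\eta_1; \Delta\eta_2)$ we have

$$\|\nabla^2\theta(\eta)[\Delta\eta, \Delta\eta]\| \le 2\|\Delta\eta_1\|\,\|\Delta\eta_2\| \le \|\Delta\eta\|^2.$$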
Therefore condition (3.8) holds with $R_{2n} = 0$ and $\|r_{1n}\| \le O(d/\sqrt{n})$. This implies that the requirement $d_r^6 + d_r^3d_c^3 = o(n)$ suffices for Assumption A to hold. To verify Assumption B, note that

$$\|\theta(\eta) - \theta(\eta_0)\| \ge \min\{\|\eta_1 - \eta_{01}\|,\ \|\eta_2\eta_1 - \eta_{02}\eta_{01}\|\}.$$

Set $\varepsilon_0 = \lambda_{\min}/4M$. If $\|\eta_1 - \eta_{01}\| > (\lambda_{\min}/4M)\|\eta - \eta_0\|$, Assumption B holds, so we can assume that $\|\eta_1 - \eta_{01}\| \le (\lambda_{\min}/4M)\|\eta - \eta_0\|$. This leads to $\|\eta_2 - \eta_{02}\| \ge (1/2)\|\eta - \eta_0\|$. In this case

$$\begin{aligned}
\|\eta_2\eta_1 - \eta_{02}\eta_{01}\| &= \|\eta_2(\eta_1 - \eta_{01}) + (\eta_2 - \eta_{02})\eta_{01}\|\\
&\ge \|\eta_2 - \eta_{02}\|\,\lambda_{\min}(\Sigma_0) - \|\eta_1 - \eta_{01}\|\,\|\eta_2\|_{2\to2}\\
&\ge \varepsilon_0\,\|\eta - \eta_0\|.
\end{aligned}$$
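The Hessian bound used for Assumption A can be checked for the bilinear component $(\eta_1, \eta_2) \mapsto \eta_2\eta_1$ that appears in the verification of Assumption B. For this illustration we assume that this product is the only nonlinear part of $\theta(\cdot)$. The second directional derivative along $\Delta\eta$ is then exactly $2\,\Delta\eta_2\Delta\eta_1$, so a central second difference recovers it. Its Frobenius norm is at most $2\|\Delta\eta_1\|\|\Delta\eta_2\| \le \|\Delta\eta_1\|^2 + \|\Delta\eta_2\|^2$. Matrix sizes and names below are hypothetical.

```python
import numpy as np


def bilinear(eta1, eta2):
    """Nonlinear component eta2 @ eta1 ((d_c x d_r) times (d_r x d_r))."""
    return eta2 @ eta1


def second_directional_derivative(eta1, eta2, d1, d2, h=1e-3):
    """Central second difference of t -> bilinear(eta + t * Delta) at t = 0."""
    plus = bilinear(eta1 + h * d1, eta2 + h * d2)
    zero = bilinear(eta1, eta2)
    minus = bilinear(eta1 - h * d1, eta2 - h * d2)
    return (plus - 2.0 * zero + minus) / h ** 2


rng = np.random.default_rng(0)
dr, dc = 3, 6
eta1 = rng.standard_normal((dr, dr))
eta1 = eta1 @ eta1.T + np.eye(dr)                 # symmetric positive definite block
eta2 = rng.standard_normal((dc, dr))
d1 = rng.standard_normal((dr, dr))
d1 = 0.5 * (d1 + d1.T)                            # symmetric direction for eta1
d2 = rng.standard_normal((dc, dr))
hess = second_directional_derivative(eta1, eta2, d1, d2)
n1, n2 = np.linalg.norm(d1), np.linalg.norm(d2)   # Frobenius norms
print(np.linalg.norm(hess), 2 * n1 * n2, n1 ** 2 + n2 ** 2)
```

Because $t \mapsto (\eta_2 + t\Delta\eta_2)(\eta_1 + t\Delta\eta_1)$ is a quadratic polynomial, the second difference is exact up to rounding. The bound then follows from submultiplicativity of the Frobenius norm and $2ab \le a^2 + b^2$.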

5.5. Single Structural Equation Model. Next we consider the single structural equation

$$y_1 = Y_2\beta + Z_1\gamma + v, \tag{5.21}$$

for which the associated reduced form system can be partitioned as

$$(y_1 : Y_2) = (Z_1 : Z_2)\begin{pmatrix}\pi_{11} & \Pi_{12}\\ \pi_{21} & \Pi_{22}\end{pmatrix} + (u_1 : U_2). \tag{5.22}$$

We assume full column rank of $Z$ and $\mathrm{rank}(\pi_{21} : \Pi_{22}) = \mathrm{rank}(\Pi_{22}) = d_r - 1$, where $d_r$ is the dimension of $(y_1 : Y_2)$. Compatibility between the models (5.21) and (5.22) requires that

$$\pi_{11} = \Pi_{12}\beta + \gamma, \quad \pi_{21} = \Pi_{22}\beta, \quad\text{and}\quad u_1 = U_2\beta + v.$$

The model can also be embedded in (5.15) as follows:

$$\eta = \begin{pmatrix}\Sigma^{-1}\\ \Pi_{12}\\ \Pi_{22}\\ \beta\\ \gamma\end{pmatrix}, \qquad \theta(\eta) = \begin{pmatrix}\Sigma^{-1}\\ \begin{pmatrix}\gamma + \Pi_{12}\beta & \Pi_{12}\\ \Pi_{22}\beta & \Pi_{22}\end{pmatrix}\end{pmatrix}. \tag{5.23}$$

Arguments similar to those used in Section 5.4 show that Assumptions A and B hold.
APPENDIX A: NOTATION

For $a, b \in \mathbb{R}^d$, their (Euclidean) inner product is denoted by $\langle a, b\rangle$, and $\|a\| = \sqrt{\langle a, a\rangle}$. The unit sphere in $\mathbb{R}^d$ is denoted by $S^{d-1} = \{v \in \mathbb{R}^d : \|v\| = 1\}$. For a linear operator $A$, the operator norm is denoted by $\|A\| = \sup\{\|Aa\| : \|a\| = 1\}$. Let $\phi_d(\cdot;\mu,V)$ denote the $d$-dimensional Gaussian density function with mean $\mu$ and covariance matrix $V$.



APPENDIX B: TECHNICAL RESULTS

In this section we prove the technical lemmas needed to prove our main results in the following section. Our exposition follows the work of Ghosal [12]. For the sake of completeness, we include Proposition 1, which can be found in Portnoy [18], and a specialized version of Lemma 1 of Ghosal [12]. All the remaining proofs use different techniques and weaker assumptions, which lead to a sharper analysis. In particular, we no longer require the prior to be proper, we impose no bounds on the growth of $\det(\psi''(\theta_0))$, and $\ln n$ and $\ln d$ need not be of the same order.

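As mentioned earlier, we follow the notation in Ghosal [12]. For $u \in U$ let

$$Z_n(u) = \exp\Big(\langle u, \Delta_n\rangle - n\big[\psi(\theta_0 + n^{-1/2}J^{-1}u) - \psi(\theta_0) - n^{-1/2}\langle\psi'(\theta_0), J^{-1}u\rangle\big]\Big) \tag{B.24}$$

and

$$\bar Z_n(u) = \exp\Big(\langle u, \Delta_n\rangle - \frac{1}{2}\|u\|^2\Big); \tag{B.25}$$

if $\theta_0 + n^{-1/2}J^{-1}u \notin \Theta$, let $Z_n(u) = \bar Z_n(u) = 0$. The quantity (B.24) is the likelihood ratio associated with $f$ as a function of $u$. In a parallel manner, (B.25) is associated with a standard Gaussian density.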

We start by recalling a result on the Taylor expansion of $\psi$, which is key to controlling deviations between $Z_n(u)$ and $\bar Z_n(u)$.

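Proposition 1 (Portnoy [18]) Let $\psi'$ and $\psi''$ denote respectively the gradient and the Hessian of $\psi$. For any $\theta, \theta_0 \in \Theta$, there exists $\bar\theta = \lambda\theta + (1-\lambda)\theta_0$, for some $\lambda \in [0,1]$, such that

$$\psi(\theta) = \psi(\theta_0) + \langle\psi'(\theta_0), \theta - \theta_0\rangle + \frac{1}{2}\langle\theta - \theta_0, \psi''(\theta_0)(\theta - \theta_0)\rangle + \frac{1}{6}E_{\theta_0}\big[\langle\theta - \theta_0, W\rangle^3\big] + \frac{1}{24}\Big(E_{\bar\theta}\big[\langle\theta - \theta_0, W\rangle^4\big] - 3\big(E_{\bar\theta}\big[\langle\theta - \theta_0, W\rangle^2\big]\big)^2\Big), \tag{B.26}$$

where $E_\theta[g(W)]$ denotes the expectation of $g(U - E_\theta[U])$ with $U \sim f(\cdot;\theta)$.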
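In one dimension, (B.26) is Taylor's theorem for the cumulant function $\psi$, with $\psi'''(\theta_0) = E_{\theta_0}[W^3]$ and $\psi''''(\bar\theta) = E_{\bar\theta}[W^4] - 3(E_{\bar\theta}[W^2])^2$. The sketch below illustrates this for the Poisson family, $\psi(\theta) = e^\theta$, which is chosen only for illustration. It computes the central moments from the probability mass function. It then checks that the fourth-order remainder equals $\frac{1}{24}h^4$ times a fourth cumulant evaluated at some point between $\theta_0$ and $\theta_0 + h$.

```python
import math


def poisson_central_moments(lam, kmax=200):
    """E[W^2], E[W^3], E[W^4] for W = U - lam, U ~ Poisson(lam), by direct summation."""
    m2 = m3 = m4 = 0.0
    for k in range(kmax):
        pk = math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))
        w = k - lam
        m2 += pk * w ** 2
        m3 += pk * w ** 3
        m4 += pk * w ** 4
    return m2, m3, m4


theta0, h = 0.3, 0.4
psi = math.exp                                   # cumulant function of the Poisson family
lam0 = math.exp(theta0)
m2, m3, _ = poisson_central_moments(lam0)
# Expansion (B.26) up to the cubic term; psi'(theta0) = lam0 is the Poisson mean.
cubic = psi(theta0) + lam0 * h + 0.5 * m2 * h ** 2 + m3 * h ** 3 / 6.0
remainder = psi(theta0 + h) - cubic
# Fourth cumulant kappa4(theta) = E_theta[W^4] - 3 (E_theta[W^2])^2 at both endpoints.
kappas = []
for t in (theta0, theta0 + h):
    a2, _, a4 = poisson_central_moments(math.exp(t))
    kappas.append(a4 - 3.0 * a2 ** 2)
lower, upper = min(kappas) * h ** 4 / 24.0, max(kappas) * h ** 4 / 24.0
print(remainder, lower, upper)
```

For the Poisson family every cumulant equals $e^\theta$, which is increasing in $\theta$, so the remainder $e^{\bar\theta}h^4/24$ must fall between the two endpoint values.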

Based on Proposition 1, we control the pointwise deviation between $Z_n$ and $\bar Z_n$ in a neighborhood of zero (i.e., in a neighborhood of the true parameter).
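Lemma 5 (Essentially in Ghosal [12] or Portnoy [18]) For all $u$ such that $\|u\| \le \sqrt{cd}$, we have

$$|\ln Z_n(u) - \ln\bar Z_n(u)| \le \lambda_n(c)\|u\|^2 \quad\text{and}\quad \ln Z_n(u) \le \langle\Delta_n, u\rangle - \frac{1}{2}\|u\|^2\big(1 - 2\lambda_n(c)\big).$$

Proof. Under our definitions, $(I) = |\ln Z_n(u) - \ln\bar Z_n(u)| = n\big|\psi(\theta_0 + n^{-1/2}J^{-1}u) - \psi(\theta_0) - n^{-1/2}\langle\psi'(\theta_0), J^{-1}u\rangle - \frac{1}{2n}\|u\|^2\big|$. Apply Proposition 1, and drop the nonpositive term $-3(E_{\bar\theta}[\cdot])^2$. This bounds $(I)$ from above by

$$(I) \le n\Big(\frac{n^{-3/2}\|u\|^3}{6}B_{1n}(0) + \frac{n^{-2}\|u\|^4}{24}B_{2n}(c)\Big) \le \lambda_n(c)\|u\|^2,$$

where the last step uses $\|u\| \le \sqrt{cd}$. The second inequality follows directly from the first result. $\blacksquare$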

Next we show how to bound the integrated deviation between the quantities in (B.24) and (B.25), restricted to the neighborhood of zero.



Lemma 6 For any c > we have 
_ -l 

Z n {u)du 



{u:\\u\\<Vcd} 



\Z n {u) - Z n {u)\du < cdX n {c)e cdX " {c) 



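Proof. Using $|e^x - e^y| \le |x - y|\max\{e^x, e^y\}$ and Lemma 5 (since $\|u\| \le \sqrt{cd}$), we have

$$|Z_n(u) - \tilde Z_n(u)| \le |\ln Z_n(u) - \ln \tilde Z_n(u)|\exp\Big(\langle \Delta_n, u\rangle - \tfrac{1}{2}(1 - 2\lambda_n(c))\|u\|^2\Big) \le \lambda_n(c)\|u\|^2\exp\Big(\langle \Delta_n, u\rangle - \tfrac{1}{2}(1 - 2\lambda_n(c))\|u\|^2\Big).$$

By integrating over the set $H(\sqrt{cd}) = \{u : \|u\| \le \sqrt{cd}\}$ we obtain

$$\begin{aligned}
\int_{H(\sqrt{cd})} |Z_n(u) - \tilde Z_n(u)|\,du &\le \int_{H(\sqrt{cd})} \lambda_n(c)\|u\|^2 \exp\Big(\langle \Delta_n, u\rangle - \tfrac{1}{2}(1 - 2\lambda_n(c))\|u\|^2\Big)du \\
&\le cd\,\lambda_n(c)\int_{H(\sqrt{cd})} \exp\Big(\langle \Delta_n, u\rangle - \tfrac{1}{2}(1 - 2\lambda_n(c))\|u\|^2\Big)du \\
&\le cd\,\lambda_n(c)\,e^{cd\lambda_n(c)}\int_{H(\sqrt{cd})} \exp\Big(\langle \Delta_n, u\rangle - \tfrac{1}{2}\|u\|^2\Big)du \\
&\le cd\,\lambda_n(c)\,e^{cd\lambda_n(c)}\int \tilde Z_n(u)\,du. \qquad ■
\end{aligned}$$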
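Lemma 6 can be checked numerically in dimension $d = 1$. The sketch below uses a hypothetical $Z_n$ whose log differs from $\ln \tilde Z_n$ by $\lambda u^2 \sin u$. The first claim of Lemma 5 then holds with $\lambda_n(c) = \lambda$. The integral of $\tilde Z_n$ over $\mathbb{R}$ is available in closed form, $\sqrt{2\pi}\,e^{\Delta^2/2}$. The left-hand side is computed with a trapezoid rule on $[-\sqrt{c}, \sqrt{c}]$. The perturbation and the name `lemma6_sides` are illustrative choices, not part of the text.

```python
import math

def lemma6_sides(delta=0.5, lam=0.05, c=4.0, steps=4000):
    """Return (lhs, rhs) of Lemma 6 for d = 1 and a hypothetical Z_n.

    Hypothetical choice: ln Z_n(u) = ln tilde Z_n(u) + lam * u^2 * sin(u), so that
    |ln Z_n(u) - ln tilde Z_n(u)| <= lam * u^2, i.e. the first claim of Lemma 5
    holds with lambda_n(c) = lam.
    """
    def log_tilde(u):
        return delta * u - 0.5 * u * u          # (B.25) with d = 1

    def log_z(u):
        return log_tilde(u) + lam * u * u * math.sin(u)

    r = math.sqrt(c)                            # sqrt(c d) with d = 1
    h = 2.0 * r / steps
    total = 0.0
    for i in range(steps + 1):                  # trapezoid rule on [-r, r]
        u = -r + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * abs(math.exp(log_z(u)) - math.exp(log_tilde(u)))
    lhs = total * h
    # integral of tilde Z_n over R in closed form
    int_tilde = math.sqrt(2.0 * math.pi) * math.exp(delta ** 2 / 2.0)
    rhs = c * lam * math.exp(c * lam) * int_tilde
    return lhs, rhs

print(lemma6_sides())
```

Both sides are multiplied by $\int \tilde Z_n$, so the comparison is equivalent to the normalized statement of the lemma.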



The next lemma controls the tail of $Z_n$ relative to $\tilde Z_n$. To do so, it uses a concentration inequality for log-concave density functions developed by Lovász and Vempala [16]. The lemma is stated with a given bound on the norm of $\Delta_n$, which is allowed to grow with the dimension. Such a bound on $\Delta_n$ can easily be obtained, with probability arbitrarily close to one, by standard arguments.
Lemma 7. Suppose that $\|\Delta_n\|^2 \le C_1 d$ and $\lambda_n(c) \le 1/16$. Then for every $k \ge 1$ we have

$$\int_{\{u : \|u\| \ge k\sqrt{cd}\}} \pi(\theta_0 + n^{-1/2}J^{-1}u)\,Z_n(u)\,du \le \Big(\sup_{\theta}\pi(\theta)\Big)\Big[e^{cd\lambda_n(c)}\int \tilde Z_n(u)\,du\Big]e^{-kcd/8},$$

where $c > 16\max\{4C_1, 1/(1 - 2\lambda_n(c))\}$.

Proof. Define $H(a) := \{u : \|u\| \le a\}$ and denote its complement by $H(a)^c$. Then we have

$$\int_{H(k\sqrt{cd})^c} \pi(\theta_0 + n^{-1/2}J^{-1}u)\,Z_n(u)\,du \le \sup_u \pi(\theta_0 + n^{-1/2}J^{-1}u)\int_{H(k\sqrt{cd})^c} Z_n(u)\,du.$$



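Next, note that $\Delta_n \in H(k\sqrt{cd})$, since $\|\Delta_n\| \le \sqrt{C_1 d} \le \sqrt{cd} \le k\sqrt{cd}$. Moreover, for any $u \in H(k\sqrt{cd})^c$, the point $\bar u := k\sqrt{cd}\,u/\|u\|$ lies on the boundary of $H(k\sqrt{cd})$ and satisfies

$$\ln Z_n(\bar u) \le \langle \Delta_n, \bar u\rangle - \tfrac{1}{2}\|\bar u\|^2\big(1 - 2\lambda_n(c)\big) \le k\sqrt{cd}\sqrt{C_1 d} - \tfrac{1}{2}k^2cd\big(1 - 2\lambda_n(c)\big).$$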

Under our assumptions we have 

since $c > 16$ and assuming $\lambda_n(c) < 1/4$. Using Lemma 5.16 of [16] we have

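$$\begin{aligned}
\int_{H(k\sqrt{cd})^c} Z_n(u)\,du &\le \big(e^{1-\tilde c}\,\tilde c\big)^{d-1}\int Z_n(u)\,du \\
&\le 2\big(e^{1-\tilde c}\,\tilde c\big)^{d-1}\int_{H(\sqrt{cd})} Z_n(u)\,du \\
&\le 2\big(e^{1-\tilde c}\,\tilde c\big)^{d-1}e^{cd\lambda_n(c)}\int_{H(\sqrt{cd})} \tilde Z_n(u)\,du,
\end{aligned}$$

where we used that $\int Z_n(u)\,du \le 2\int_{H(\sqrt{cd})} Z_n(u)\,du$ (note that $k$ does not appear).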

Since c > 16, we have m := c ^(1 — 1/d)^ — A n (c)^ — 1 (since we expect 
A n (c) -» and 1/d -> 0, m -» f - 1 > §). ■ 
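The concentration step above relies on Lemma 5.16 of [16]. As we recall it, that lemma states the following. If $X$ is drawn from a log-concave density $f$ on $\mathbb{R}^d$, then for every $\beta \ge 2$,

$$P\big(f(X) \le e^{-\beta(d-1)}\max f\big) \le (e^{1-\beta}\beta)^{d-1}.$$

This reading should be checked against [16]. The sketch below tests the bound by Monte Carlo for the standard gaussian, a log-concave example chosen by us. For that density the event is simply $\|X\|^2 \ge 2\beta(d-1)$. The function name `lv_tail` is hypothetical.

```python
import math
import random

def lv_tail(d, beta, samples=20_000, seed=0):
    """Monte Carlo check of the tail bound recalled from Lemma 5.16 of [16] for N(0, I_d).

    For the standard gaussian, f(X) / max f = exp(-||X||^2 / 2), so the event
    {f(X) <= exp(-beta (d - 1)) max f} is {||X||^2 >= 2 beta (d - 1)}.
    Returns (empirical probability, bound (e^{1 - beta} beta)^{d - 1}).
    """
    rng = random.Random(seed)
    threshold = 2.0 * beta * (d - 1)
    hits = 0
    for _ in range(samples):
        sq_norm = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(d))
        hits += sq_norm >= threshold
    bound = (math.exp(1.0 - beta) * beta) ** (d - 1)
    return hits / samples, bound

for d, beta in [(3, 2.0), (5, 3.0), (10, 2.0)]:
    print(d, beta, lv_tail(d, beta))
```

The gaussian case is far from the worst case, so the empirical probabilities are well below the bound. The point of the sketch is only to make the form of the inequality concrete.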

We note that the value of $c$ in the previous lemma could depend on $n$, as long as the condition is satisfied. In fact, we can take $c$ as large as

$$a_n := \sup\{c : \lambda_n(c) \le 1/16\}. \tag{B.27}$$

(B.27) characterizes a neighborhood of size $\sqrt{a_n d}$ on which the quantity $Z_n(\cdot)$ can still be bounded by a proper gaussian. Lemma 7 bounds the contribution outside this neighborhood. We close this section with a technical lemma for bounding the difference between the expectations of a function with respect to two probability densities.
Lemma 8. Let $f$ and $g$ be two nonnegative integrable functions in $\mathbb{R}^d$, and define $I_f = \int f(u)\,du$ and $I_g = \int g(u)\,du$. Moreover, let $h$ be a third positive function and let $A \subset \mathbb{R}^d$ be a set such that $h_f = \int_A h(u)f(u)\,du < \infty$. Then

$$\int_A h(u)\Big|\frac{f(u)}{I_f} - \frac{g(u)}{I_g}\Big|\,du \le \Big(\max_{u\in A}h(u) + \frac{h_f}{I_f}\Big)\frac{1}{I_g}\int_A |f(u) - g(u)|\,du.$$



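Proof. Simply note that

$$\begin{aligned}
\int_A h(u)\Big|\frac{f(u)}{I_f} - \frac{g(u)}{I_g}\Big|\,du &\le \int_A h(u)f(u)\Big|\frac{1}{I_f} - \frac{1}{I_g}\Big|\,du + \frac{1}{I_g}\int_A h(u)|f(u) - g(u)|\,du \\
&= \frac{h_f}{I_f}\,\frac{|I_g - I_f|}{I_g} + \frac{1}{I_g}\int_A h(u)|f(u) - g(u)|\,du \\
&\le \Big(\max_{u\in A}h(u) + \frac{h_f}{I_f}\Big)\frac{1}{I_g}\int_A |f(u) - g(u)|\,du. \qquad ■
\end{aligned}$$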



APPENDIX C: PROOF OF THEOREMS 1 AND 2

Armed with Lemmas 5, 6, 7 and 8, we now show the asymptotic normality and moment convergence results (Theorems 1 and 2, respectively) under the appropriate growth conditions on the dimension of the parameter space relative to the sample size.

It is easy to see that Theorem 1 follows from Theorem 2 with $a = 0$; therefore its proof is omitted.

Proof of Theorem 2. Let

$$M_{d,a} := (d + a)\Big(1 + \frac{a\ln(d + a)}{d + a}\Big).$$

In the case that $a$ is constant and $d$ grows to infinity, this simplifies to a multiple of $d$. In the analysis we will use that $\sqrt{cM_{d,a}} > 4\|\Delta_n\|$ (recall that $\|\Delta_n\| = O(\sqrt d)$). We divide the integral of (2.7) into two regions,

$$A = \{u \in \mathbb{R}^d : \|u\| \le \sqrt{cM_{d,a}}\} \quad\text{and}\quad A^c,$$

where $c$ is a fixed constant. Thus we have



$$\int \|u\|^a\,|\pi_n^*(u) - \phi_d(u;\Delta_n,I_d)|\,du \le \int_A \|u\|^a\,|\pi_n^*(u) - \phi_d(u;\Delta_n,I_d)|\,du + \int_{A^c} \|u\|^a\,|\pi_n^*(u) - \phi_d(u;\Delta_n,I_d)|\,du. \tag{C.28}$$

To bound the first term, we use Lemma 8 with $h(u) = \|u\|^a$ and $A$ as defined above. In this case, we have $h_f/I_f \le \max_{u\in A}h(u) = c^{a/2}M_{d,a}^{a/2}$.
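Therefore

$$\begin{aligned}
\int_A \|u\|^a\,|\pi_n^*(u) - \phi_d(u;\Delta_n,I_d)|\,du &\le \frac{2c^{a/2}M_{d,a}^{a/2}}{\int \pi(\theta_0)\tilde Z_n(u)\,du}\int_A \big|\pi(\theta_0 + n^{-1/2}J^{-1}u)Z_n(u) - \pi(\theta_0)\tilde Z_n(u)\big|\,du \\
&\le 2c^{a/2}M_{d,a}^{a/2}\sup_{u\in A}\Big|\frac{\pi(\theta_0 + n^{-1/2}J^{-1}u)}{\pi(\theta_0)} - 1\Big|\frac{\int_A Z_n(u)\,du}{\int \tilde Z_n(u)\,du} + \frac{2c^{a/2}M_{d,a}^{a/2}}{\int \tilde Z_n(u)\,du}\int_A |Z_n(u) - \tilde Z_n(u)|\,du.
\end{aligned}$$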



To bound the very last term, we apply Lemma 6 with $c_{d,a} = cM_{d,a}/d$ to obtain

$$\frac{2c^{a/2}M_{d,a}^{a/2}}{\int \tilde Z_n(u)\,du}\int_A |Z_n(u) - \tilde Z_n(u)|\,du \le 2c^{1+a/2}M_{d,a}^{1+a/2}\,\lambda_n(c_{d,a})\,e^{cM_{d,a}\lambda_n(c_{d,a})},$$

which converges to zero under our assumption (ii').

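On the other hand, the first term is bounded by assumption (iv'). Moreover, assumption (iv') also ensures that this term converges to zero, as follows:

$$2c^{a/2}M_{d,a}^{a/2}\sup_{u\in A}\Big|\frac{\pi(\theta_0 + n^{-1/2}J^{-1}u)}{\pi(\theta_0)} - 1\Big|\frac{\int_A Z_n(u)\,du}{\int \tilde Z_n(u)\,du} \le 2c^{a/2}M_{d,a}^{a/2}\,e^{cM_{d,a}\lambda_n(c_{d,a})}\sup_{u\in A}\Big|\frac{\pi(\theta_0 + n^{-1/2}J^{-1}u)}{\pi(\theta_0)} - 1\Big|.$$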



The second term of (C.28) is bounded above by

$$\frac{1}{\int \tilde Z_n(u)\,du}\int_{A^c} \|u\|^a\,\tilde Z_n(u)\,du + \frac{1}{\int \pi(\theta_0)\tilde Z_n(u)\,du}\int_{A^c} \|u\|^a\,\pi(\theta_0 + n^{-1/2}J^{-1}u)\,Z_n(u)\,du.$$



The first term above converges to zero by standard bounds on gaussian 
densities for an appropriate choice of the constant c (note that c can be 
chosen independently of d and a). 

Finally, we bound the last term. Let $A_k^c := \{u : \|u\| \in [k\sqrt{cM_{d,a}}, (k+1)\sqrt{cM_{d,a}}]\}$. Thus we have

$$\int_{A^c} \|u\|^a\,\pi(\theta_0 + n^{-1/2}J^{-1}u)\,Z_n(u)\,du \le \sum_{k=1}^{\infty}(k+1)^a c^{a/2}M_{d,a}^{a/2}\int_{A_k^c}\pi(\theta_0 + n^{-1/2}J^{-1}u)\,Z_n(u)\,du. \tag{C.29}$$
Using Lemma 7 for each integral, we have

$$\int_{A_k^c}\pi(\theta_0 + n^{-1/2}J^{-1}u)\,Z_n(u)\,du \le \Big(\sup_\theta \pi(\theta)\Big)\Big[e^{cM_{d,a}\lambda_n(c_{d,a})}\int \tilde Z_n(u)\,du\Big]e^{-kcM_{d,a}/8}.$$

Since $M_{d,a} \ge \max\{1, a\}$, we have

$$\sum_{k=1}^{\infty}(k+1)^a e^{-kcM_{d,a}/8} \le e^{-cM_{d,a}/10}$$

by choosing $c$ large enough. Moreover, our definition of $M_{d,a}$ also implies that

$$c^{a/2}M_{d,a}^{a/2}e^{-cM_{d,a}/10} = \exp\Big(\frac{a}{2}\big(\ln c + \ln M_{d,a}\big) - \frac{cM_{d,a}}{10}\Big) \le \exp(-cM_{d,a}/20),$$

provided that $c$ is large enough.
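The two displayed inequalities can be checked on the log scale for a concrete constant. The sketch below uses the hypothetical value $c = 40$ and the definition of $M_{d,a}$ given at the start of the proof. It evaluates the series with a log-sum-exp truncated at 500 terms, which is far beyond the point where the terms become negligible, to avoid floating-point underflow. The grid of $(d, a)$ values and the function names are our own choices.

```python
import math

def m_da(d, a):
    """M_{d,a} = (d + a)(1 + a ln(d + a) / (d + a)), as defined in the proof of Theorem 2."""
    return (d + a) * (1.0 + a * math.log(d + a) / (d + a))

def log_series(a, M, c, terms=500):
    """log of sum_{k >= 1} (k + 1)^a exp(-k c M / 8), computed with log-sum-exp."""
    logs = [a * math.log(k + 1) - k * c * M / 8.0 for k in range(1, terms + 1)]
    top = max(logs)
    return top + math.log(sum(math.exp(x - top) for x in logs))

def bounds_hold(d, a, c=40.0):
    """Check both displayed inequalities on the log scale for a hypothetical c."""
    M = m_da(d, a)
    series_ok = log_series(a, M, c) <= -c * M / 10.0
    prefactor_ok = 0.5 * a * (math.log(c) + math.log(M)) - c * M / 10.0 <= -c * M / 20.0
    return series_ok and prefactor_ok

print(all(bounds_hold(d, a) for d in (1, 5, 50, 1000) for a in (0, 0.5, 1, 3, 10, 100)))
```

The second inequality uses the specific form of $M_{d,a}$. With only $M_{d,a} \ge \max\{1, a\}$, a fixed $c$ would not suffice for large $a$, because the $\ln M_{d,a}$ term is not controlled.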

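We have that (C.29) can be bounded above by

$$\Big(\sup_\theta \pi(\theta)\Big)\Big[e^{cM_{d,a}\lambda_n(c_{d,a})}\int \tilde Z_n(u)\,du\Big]e^{-cM_{d,a}/20} = o\Big(\pi(\theta_0)\int \tilde Z_n(u)\,du\Big)$$

under our assumptions. Therefore the result follows. □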

APPENDIX D: PROOFS OF SECTION ?? 
Proof of Lemma 1. Divide $\Gamma$ into three regions:

$$(I) := B(0, kN_G), \qquad (II) := \{\gamma : \max\{\|\gamma\|, \|u_\gamma\|\} \le kN_\Gamma\}\setminus B(0, kN_G),$$

$$(III) := \Gamma\setminus\big((I)\cup(II)\big),$$

where $k$ is chosen later to be large enough, independently of the dimensions $d$ and $d_1$. Region (I) is the region where the linear approximation $G$ for $\theta(\cdot)$ is valid in the sense of Assumption A. Region (III) represents the tail of the distribution: either $\gamma$ or $u_\gamma$ has large norm. Finally, region (II) is an intermediate region in which $G$ is not a valid approximation, but where we still have useful guarantees on deviations from normality. We point out that regions (II) and (III) might be highly non-convex. We will derive sufficient conditions on the values of $N_G$ and $N_\Gamma$ as functions of $d$ and $d_1$. It will be sufficient to set $N_G = \sqrt d$ and $N_\Gamma = \sqrt{d\log d}$.

For notational convenience we define $c_G = k^2N_G^2/d$ and $c_\Gamma = k^2N_\Gamma^2/d$. Our assumptions are such that

$$d\lambda_n(c_G) \to 0 \quad\text{and}\quad \lambda_n(c_\Gamma) \le 1/16.$$
We first bound the contribution of region (III). For any $\gamma \in (III)$, define $\bar u_\gamma = kN_G\,u_\gamma/\|u_\gamma\|$. Using Lemma 5 we have

$$\ln Z_n(\bar u_\gamma) \le \langle \Delta_n, \bar u_\gamma\rangle - \tfrac{1}{2}\big(1 - 2\lambda_n(c_G)\big)\|\bar u_\gamma\|^2.$$

Since $\ln Z_n(\cdot)$ is concave in $u$ and $\ln Z_n(0) = 0$ by design, we have

$$\ln Z_n(u_\gamma) \le \frac{\|u_\gamma\|}{kN_G}\ln Z_n(\bar u_\gamma) \le -\|u_\gamma\|N_G\,\frac{k\big(1 - 2\lambda_n(c_G)\big)}{4} \le -\|u_\gamma\|N_G\,\frac{k}{5} \tag{D.30}$$

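by choosing $k$ large enough that $\|\Delta_n\| \le \tfrac{1}{8}kN_G$, and using that $\lambda_n(c_G) \le 1/16$. The contribution of (III) can be bounded by

$$\int_{(III)}\pi\big(\theta(\eta_0) + n^{-1/2}u_\gamma\big)Z_n(u_\gamma)\,d\gamma \le \pi(\theta_0)\Big(\sup_{\theta\in\Theta}\frac{\pi(\theta)}{\pi(\theta_0)}\Big)\int_{(III)}\exp\Big(-\frac{k}{5}N_G\|u_\gamma\|\Big)d\gamma.$$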



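The integral on the right can be bounded as follows:

$$\int_{(III)}\exp\Big(-\frac{k}{5}N_G\|u_\gamma\|\Big)d\gamma \le \int_{B(0,kN_\Gamma)\cap(III)}\exp\Big(-\frac{k}{5}N_G\|u_\gamma\|\Big)d\gamma + \int_{B(0,kN_\Gamma)^c}\exp\Big(-\frac{k}{5}N_G\|u_\gamma\|\Big)d\gamma.$$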

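By definition of (III), $\gamma \in B(0, kN_\Gamma)\cap(III)$ implies $\|u_\gamma\| > kN_\Gamma$. On the other hand, $\gamma \notin B(0, kN_\Gamma)$ implies that $\|u_\gamma\| \ge \epsilon_0 N_\Gamma$. A standard bound on the integrals yields

$$\int_{(III)}\exp\Big(-\frac{k}{5}N_G\|u_\gamma\|\Big)d\gamma \le \exp\Big(-\frac{k}{5}N_GN_\Gamma + d_1\ln(kN_\Gamma)\Big) + \exp\Big(-\frac{k}{5}\epsilon_0N_GN_\Gamma + d_1\ln d_1\Big).$$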

Using the assumption on the prior, we can bound the contribution of (III) by

$$\pi(\theta_0)\exp\Big(c_{\rm prior}d + d_1\ln d_1 + d_1\ln(kN_\Gamma) - \frac{k}{5}\epsilon_0N_GN_\Gamma\Big). \tag{D.31}$$
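The first inequality in (D.30) uses only concavity. If $g$ is concave with $g(0) = 0$ and $t \ge 1$, then $g(\bar u) \ge \tfrac{1}{t}g(t\bar u) + (1 - \tfrac{1}{t})g(0)$. Hence $g(t\bar u) \le t\,g(\bar u)$, applied with $t = \|u_\gamma\|/(kN_G)$. The sketch below checks this in one dimension for a hypothetical concave log-likelihood ratio built from the Poisson family $\psi(\theta) = e^\theta$. The function names and parameter values are illustrative only.

```python
import math

def log_z(u, n=50, theta0=0.3, delta=0.8):
    """Hypothetical concave log-likelihood ratio in d = 1 (Poisson family, psi = exp).

    ln Z(u) = delta * u - n [psi(theta0 + h) - psi(theta0) - psi'(theta0) h], h = u / sqrt(n);
    it is concave in u and equals 0 at u = 0. The scaling of u is illustrative only.
    """
    h = u / math.sqrt(n)
    return delta * u - n * math.exp(theta0) * (math.expm1(h) - h)

def radial_bound_holds(u_bar, t, tol=1e-9):
    """Concavity with ln Z(0) = 0 gives ln Z(t * u_bar) <= t * ln Z(u_bar) for every t >= 1."""
    return log_z(t * u_bar) <= t * log_z(u_bar) + tol

print(all(radial_bound_holds(ub, t) for ub in (-3.0, -1.0, 0.5, 2.0, 4.0)
          for t in (1.0, 1.5, 3.0, 10.0)))
```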

Next, consider $\gamma \in (II)$. By definition, $\gamma \in B(0, kN_\Gamma)\setminus B(0, kN_G)$. Under the assumption that $\lambda_n(c_\Gamma) \le 1/16$, we have that

$$\ln Z_n(u_\gamma) \le \langle \Delta_n, u_\gamma\rangle - \frac{7}{16}\|u_\gamma\|^2.$$

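Therefore, by choosing $k$ such that $kN_G \ge 8\|\Delta_n\|$, we have

$$\int_{(II)}\pi\big(\theta(\eta_0) + n^{-1/2}u_\gamma\big)Z_n(u_\gamma)\,d\gamma \le \pi(\theta_0)\Big(\sup_{\theta\in\Theta}\frac{\pi(\theta)}{\pi(\theta_0)}\Big)\int_{(II)}\exp\Big(\langle \Delta_n, u_\gamma\rangle - \frac{7}{16}\|u_\gamma\|^2\Big)d\gamma.$$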
Again using our assumption on the prior and standard bounds on gaussian densities, we can bound the contribution of (II) by

$$\pi(\theta_0)\exp\Big(c_{\rm prior}d + d_1\ln(1/\epsilon_0) - \frac{1}{8}k^2N_G^2\Big). \tag{D.32}$$

Finally, we show a lower bound on the integral over (I). First, note that for any $\gamma \in (I)$ condition (3.9) holds and we have $u_\gamma = r_{1n} + (I + R_{2n})G\gamma$. Therefore $u_\gamma \in B\big(0, (\|G\| + \delta_{n2})kN_G + \epsilon_{n1}\big) \subset B(0, 2\|G\|kN_G)$. For simplicity, let $c_{(I)} = 4\|G\|^2k^2N_G^2/d$.

$$\int_{(I)}\pi\big(\theta(\eta_0) + n^{-1/2}u_\gamma\big)Z_n(u_\gamma)\,d\gamma \ge \pi(\theta_0)\exp\Big(-K_n(c_{(I)})\sqrt{c_{(I)}d/n}\Big)\int_{(I)}Z_n(u_\gamma)\,d\gamma.$$

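Under our assumptions, $\exp\big(-K_n(c_{(I)})\sqrt{c_{(I)}d/n}\big) \to 1$. Furthermore, using (3.10), $\|\Delta_n\| = O(\sqrt d)$, and $\|\gamma\| \le kN_G$, we have

$$\begin{aligned}
\ln Z_n\big(r_{1n} + (I + R_{2n})G\gamma\big) &\ge \langle \Delta_n, r_{1n} + (I + R_{2n})G\gamma\rangle - \frac{1 + 2\lambda_n(c_{(I)})}{2}\big\|r_{1n} + (I + R_{2n})G\gamma\big\|^2 \\
&\ge o(1) + \langle \Delta_n, G\gamma\rangle - \frac{1 + 2\lambda_n(c_{(I)})}{2}\|G\gamma\|^2.
\end{aligned}$$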

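Therefore we have

$$\begin{aligned}
\int_{(I)}\pi\big(\theta(\eta_0) + n^{-1/2}u_\gamma\big)Z_n(u_\gamma)\,d\gamma &\ge \pi(\theta_0)\,O\Big(\int_{(I)}\exp\Big(\langle \Delta_n, G\gamma\rangle - \frac{1 + 2\lambda_n(c_{(I)})}{2}\|G\gamma\|^2\Big)d\gamma\Big) \\
&\ge \pi(\theta_0)\,O\Big(\big(1 - 2\lambda_n(c_{(I)})\big)^{d_1/2}\det(G'G)^{-1/2}\Big) \\
&\ge \pi(\theta_0)\,O\big(\exp(-\|G\|d_1)\big).
\end{aligned}$$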
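The middle step above is a gaussian integral. The derivation below assumes that $G$ is $d \times d_1$ with full column rank, writes $s = 1 + 2\lambda_n(c_{(I)})$, and integrates over all of $\mathbb{R}^{d_1}$. It uses $\exp(\cdot) \ge 1$ for the quadratic form in $\Delta_n$ and $s^{-1} \ge 1 - 2\lambda_n(c_{(I)})$. Passing from $\mathbb{R}^{d_1}$ to the ball $(I) = B(0, kN_G)$ presumably relies on $k$ being large, so that the gaussian mass outside the ball is small. That is our reading of the step.

```latex
\int_{\mathbb{R}^{d_1}} \exp\Big(\langle \Delta_n, G\gamma\rangle - \frac{s}{2}\|G\gamma\|^2\Big)\,d\gamma
  = \Big(\frac{2\pi}{s}\Big)^{d_1/2}\det(G'G)^{-1/2}
    \exp\Big(\frac{1}{2s}\,\Delta_n' G (G'G)^{-1} G' \Delta_n\Big)
  \;\ge\; (2\pi)^{d_1/2}\,\big(1 - 2\lambda_n(c_{(I)})\big)^{d_1/2}\det(G'G)^{-1/2}.
```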

Choosing $N_G = \sqrt d$, $N_\Gamma = \sqrt{d\log d}$ and $k$ sufficiently large, the result follows since $d \ge d_1$. ■

Proof of Theorem 3. Let $\gamma$ be such that $\eta = \eta_0 + n^{-1/2}\gamma$. We will show that $\ln Z(u_\gamma) \le -cd$ for any $\gamma \notin B(0, k\sqrt d)$, where $k$ is sufficiently large. Therefore, since the contribution of the prior is bounded by (iv), the MLE satisfies $\hat\gamma \in B(0, k\sqrt d)$ and the result follows.

Using (D.30) with $N_G = \sqrt d$ we have

$$\ln Z(u_\gamma) \le -\|u_\gamma\|\sqrt d\,k/5 \le -\epsilon_0k^2d/5.$$

As stated earlier, the result follows by choosing k sufficiently large. ■ 
Proof of Theorem 4. Using Lemma 1 and known results for gaussian densities, we can restrict our analysis to $B(0, k\sqrt d)$, since the remaining part has negligible mass. The remainder of the proof follows the same steps as the proof of Theorem 2. ■